Abstract: We show that in studies of light quark- and gluon-initiated jet discrimination, it is important to include the information on softer reconstructed jets (associated jets) around a primary hard jet. This is particularly relevant when adopting a small radius parameter for reconstructing hadronic jets. The probability of having an associated jet as a function of the primary jet transverse momentum ($p_T$) and radius, the minimum associated jet $p_T$ and the association radius is computed up to next-to-double logarithmic accuracy (NDLA), and the predictions are compared with results from the Herwig++, Pythia6 and Pythia8 Monte Carlos (MC). We demonstrate the improvement in quark-gluon discrimination obtained by using the associated jet rate variable with the help of a multivariate analysis. The associated jet rates are found to be only mildly sensitive to the choice of parton shower and hadronization algorithms, as well as to the effects of initial state radiation and underlying event. In addition, the number of $k_t$ subjets of an anti-$k_t$ jet is found to be an observable that leads to a rather uniform prediction across different MC’s, broadly in agreement with predictions in NDLA, as compared to the often used number of charged tracks observable.
1 Introduction

Hadronic jets are the most abundant objects at a proton-proton collider like the LHC, 
and it is a major challenge to separate the signals being looked for from standard 
model (SM) backgrounds in multijet final states. One promising direction, which has recently received attention in both theoretical and experimental studies, is to separate light quark-initiated jets from gluon-initiated ones in these search channels. Quarks are often encountered in the decays of new particles
predicted by scenarios beyond the standard model, as well as in the decay of the weak 
bosons, Higgs and top quark in the SM itself. On the other hand, in the corresponding 
SM backgrounds involving multiple hard jets, there is a larger fraction of gluon- 
initiated jets from QCD radiation. Here, quark- or gluon-initiated jets (henceforth 
simply referred to as quark and gluon jets) refer to the parton in the hard process 
at leading order in perturbation theory that initiates the parton shower. Based 
on the difference in the radiation patterns of quarks and gluons, a likelihood-based discriminant can be built to separate decay jets from QCD radiation jets with a certain efficiency [1].

Several variables have been proposed to separate quark and gluon jets, mostly relying on the fact that a gluon of similar energy leads to more soft emissions compared to a quark. These include discrete variables like the number of charged tracks inside the jet cone, as well as continuous ones like the width of a jet and the energy-energy-correlation (EEC) angularity [1-5]. The ATLAS and CMS collaborations have also studied the discrimination of light quarks from gluons along these lines with the 7 and 8 TeV LHC data, respectively [6, 7]. Using data samples with "enriched quark and gluon content", data-based taggers were also developed and compared to the predictions from Monte Carlo (MC) simulations. While there are differences between the predictions of different MC’s, as well as between the data-based tagger and the MC results, they are consistent with each other within the present large systematic uncertainties.

An important question in this regard is the proper choice of a jet algorithm and radius parameter. In the LHC environment, in order to keep the contribution of the underlying event and multiple proton-proton collisions at a minimum, the standard choice for multijet processes is an anti-$k_t$ algorithm with radius parameter $R = 0.4$. In addition, in the ATLAS study mentioned above, jets are required to satisfy an isolation criterion: a jet is considered isolated if there is no other reconstructed jet within a cone of size $\Delta R < 0.7$ (where $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ is the standard distance measure in the pseudorapidity-azimuthal angle plane). An optimum choice for the jet radius parameter was discussed in Refs. [8, 9] for quark and gluon jets as a function of their transverse momenta ($p_T$), and it was observed that one usually requires a larger radius for a gluon jet in order for the jet $p_T$ to be close to the parton $p_T$. However, for experimental purposes it is advantageous to use a fixed and small radius parameter for the jets, for the reasons mentioned above. Therefore, we propose to recover the missed information on radiation from the parent parton outside the chosen jet radius by including softer reconstructed jets that can be present (with a calculable probability) within a certain radius around a primary hard jet. These softer jets are referred to as "associated jets" in this study. It is important to note here that imposing an isolation criterion as above while studying quark and gluon jet properties might not be appropriate, since it rejects a fraction of the jet candidates beforehand, and thus biases the sample towards jets whose initial quark or gluon has not radiated outside the adopted jet radius.

We first compute the associated jet rates in QCD to next-to-double logarithmic accuracy in Sec. 2, and then compare the analytical results with those from different parton shower MC’s in Sec. 3. Using the information on the presence (or absence) of associated jets can improve the discrimination of quarks and gluons; we demonstrate this through a multivariate analysis in Sec. 4. Several combinations of jet discrimination variables are tried out, and an attempt is made to determine an optimum choice. Even though we include standard discrimination variables like the number of charged tracks as inputs to our multivariate analysis, it should be emphasized that they are subject to MC ambiguities stemming from parton shower algorithms and their associated parameters, and from the tunings of hadronization and underlying event (UE) models. For this reason, in order to judge the improvement in tagger performance when using the associated jet rates, we compare the performance of different sets of variables within the same MC.

In Secs. 5 and 6 we study the use of the number of subjets of a jet (defined with an exclusive $k_t$ algorithm) in place of the number of charged tracks, since the predictions of different MC’s tend to be similar for the former observable. We compute the subjet rates up to NDLA as well, and compare the NDLA results with predictions from different MC’s. Our results on both associated jets and subjets are summarized in Sec. 7. We discuss the two-dimensional joint distributions of the three discrimination variables used as inputs in the multivariate analysis in an Appendix.


2 Associated jet rates: analytical calculations 

To begin with, let us define the longitudinally invariant jet algorithms [10-13] adopted in this study. The distance measures between each pair of objects $i$ and $j$ ($d_{ij}$), and between an object and the beam ($d_{iB}$), are given by

$$d_{ij} = \min\left(p_{ti}^{2p}, p_{tj}^{2p}\right)\frac{\Delta R_{ij}^2}{R^2}, \qquad d_{iB} = p_{ti}^{2p}, \tag{2.1}$$

where $p_{ti}$, $y_i$ and $\phi_i$ are the transverse momentum, rapidity and azimuth of object $i$, respectively, $\Delta R_{ij}^2 = (y_i - y_j)^2 + (\phi_i - \phi_j)^2$ and $R$ is the jet radius parameter. The jet algorithm in use is fixed by the parameter $p$, with $p = 1, 0, -1$ for the $k_t$ [11], Cambridge/Aachen [14, 15] and anti-$k_t$ [13] algorithms, respectively. At any stage of clustering, if a $d_{ij}$ is the smallest measure we combine objects $i$ and $j$. If $d_{iB}$ is the smallest we call $i$ a jet and remove it from the clustering list. This procedure is continued until there are no more objects left to cluster.
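As an illustration of the clustering sequence just described, the following Python sketch implements the generalized-$k_t$ measures of Eq. (2.1) with a naive $O(N^3)$ search. It assumes massless inputs specified by $(p_t, y, \phi)$ and E-scheme (four-momentum) recombination; `PseudoJet` and `cluster` are hypothetical names, not the FastJet API used in the actual analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class PseudoJet:
    """Minimal four-vector (px, py, pz, E); a hypothetical stand-in, not FastJet."""
    px: float
    py: float
    pz: float
    e: float

    @classmethod
    def from_pt_y_phi(cls, pt, y, phi):
        # Massless input: E = pt cosh(y), pz = pt sinh(y)
        return cls(pt * math.cos(phi), pt * math.sin(phi),
                   pt * math.sinh(y), pt * math.cosh(y))

    def __add__(self, other):
        # E-scheme recombination: four-momenta are simply added
        return PseudoJet(self.px + other.px, self.py + other.py,
                         self.pz + other.pz, self.e + other.e)

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def y(self):
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def phi(self):
        return math.atan2(self.py, self.px)


def delta_r2(a, b):
    """Squared (y, phi) distance, with the azimuthal difference wrapped into [0, pi]."""
    dphi = abs(a.phi - b.phi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (a.y - b.y) ** 2 + dphi ** 2


def cluster(particles, R, p):
    """Inclusive generalized-kt clustering, Eq. (2.1): p = 1 (kt), 0 (C/A), -1 (anti-kt)."""
    objs = list(particles)
    jets = []
    while objs:
        best_d, best_i, best_j = math.inf, -1, None
        for i, a in enumerate(objs):
            wa = a.pt ** (2 * p)
            if wa < best_d:                          # beam distance d_iB
                best_d, best_i, best_j = wa, i, None
            for j in range(i + 1, len(objs)):
                b = objs[j]
                dij = min(wa, b.pt ** (2 * p)) * delta_r2(a, b) / R ** 2
                if dij < best_d:                     # pair distance d_ij
                    best_d, best_i, best_j = dij, i, j
        if best_j is None:
            jets.append(objs.pop(best_i))            # d_iB smallest: i becomes a jet
        else:
            merged = objs[best_i] + objs[best_j]
            objs.pop(best_j)                         # best_j > best_i, so remove it first
            objs.pop(best_i)
            objs.append(merged)
    return sorted(jets, key=lambda jet: jet.pt, reverse=True)
```

For example, `cluster(particles, R=0.4, p=-1)` returns anti-$k_t$ jets ordered in $p_t$; setting `p=1` or `p=0` switches to $k_t$ or Cambridge/Aachen clustering with the same code path.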

Once a primary jet $j$ has been defined, say using the anti-$k_t$ algorithm with radius parameter $R$, we define a nearby jet $i$ with $p_{tj} > p_{ti} > p_a$ and $R < \Delta R_{ij} < R_a$ as an associated jet. Thus the associated jet rates are functions of the primary jet $p_t = p_j$, its radius $R$, the association radius $R_a$ and the minimum associated jet $p_t = p_a$. In Fig. 1 we illustrate the idea of an associated jet schematically, and show the relevant variables that determine the associated jet rate.
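The association criterion can be written as a simple filter. This is a sketch under the assumption that jets are already reconstructed and given as hypothetical `(pt, y, phi)` tuples; the function names are illustrative.

```python
import math


def delta_r(j1, j2):
    """(y, phi) distance between two jets given as (pt, y, phi) tuples."""
    dphi = abs(j1[2] - j2[2])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(j1[1] - j2[1], dphi)


def associated_jets(primary, others, R, R_a, p_a):
    """Jets i with p_tj > p_ti > p_a and R < dR_ij < R_a around the primary jet j."""
    return [jet for jet in others
            if primary[0] > jet[0] > p_a and R < delta_r(primary, jet) < R_a]
```

The number of associated jets is then `len(associated_jets(...))`; whether this is zero or not is the kind of information used as a category in the multivariate analysis of Sec. 4.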

In perturbative QCD, the rate of $n$-jet production from a primary object of type $i$ ($i = q, g$ in this case), $R_n^i$, can be obtained from the associated generating function [16-19]

$$\Phi_i(u) = \sum_n u^n R_n^i . \tag{2.2}$$

We can recover the jet rates by differentiating at $u = 0$,

$$R_n^i = \frac{1}{n!}\left.\frac{d^n \Phi_i}{du^n}\right|_{u=0} . \tag{2.3}$$
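As a simple illustration of Eq. (2.3) (a generic example, not a result of the calculation), suppose a generating function is known only to second order in $u$, $\Phi(u) = u\,\Delta\,[1 + uX + \mathcal{O}(u^2)]$, with $\Delta$ and $X$ placeholders. Then

$$R_1 = \left.\frac{d\Phi}{du}\right|_{u=0} = \Delta, \qquad R_2 = \frac{1}{2!}\left.\frac{d^2\Phi}{du^2}\right|_{u=0} = \Delta X,$$

so the coefficient of $u^1$ is the one-jet rate and the coefficient of $u^2$ the two-jet rate. This is the structure used below to read off the probabilities of 0 and 1 associated jets.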
Figure 1. A schematic illustration of associated jets, and the relevant variables which 
determine the associated jet rate (see text for details). 

The jet rates $R_n^i = R_n^i(p_j, \zeta)$ are functions of the trigger jet transverse momentum $p_j$ and the evolution scale for parton showering, which, for hadron-hadron collisions, is taken as $\zeta = \Delta R^2/2$. This is equivalent to the evolution scale for coherent parton showering, $\zeta = 1 - \cos\theta$, with $\theta$ being the emission angle ($\Delta R^2/2 \sim \theta^2/2 \sim 1 - \cos\theta$). To be resolved, an emission must have $\zeta > \zeta_0 = R^2/2$ and $p_t > p_a$. Since the jet rates $R_n^i$ include the trigger jet $j$, the probability of $n$ associated jets for a jet of type $i$ with transverse momentum $p_j$ is

$$P_n^i = R_{n+1}^i(p_j, \zeta_a) . \tag{2.4}$$

Here, $\zeta_a = R_a^2/2$, with $R_a$ being the association radius defined above.

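The generating functions $\Phi_i(u)$ were computed in the context of $e^+e^-$ collisions in Ref. [16] up to next-to-double logarithmic accuracy (NDLA). Here, leading double and next-to-double logarithms refer to $\alpha_s^n \log^{2n}$ and $\alpha_s^n \log^{2n-1}$, where the logarithms are those of $R_a/R$ and/or $p_j/p_a$. For $p_a$ sufficiently large, these terms are determined by the timelike showering of final-state partons, while contributions from initial-state showers and the underlying event can be avoided. Following the same methods as in Ref. [16] for hadron-hadron collisions, for $\zeta > \zeta_0$ and $p_j > p_a$, we have the quark and gluon generating functions to NDLA

$$\begin{aligned}
\Phi_q(u,p_j,\zeta) &= u + \int_{\zeta_0}^{\zeta}\frac{d\zeta'}{\zeta'}\int_{p_a/p_j}^{1} dz\,\frac{\alpha_s(k_t)}{2\pi}\,P_{gq}(z)\,\Phi_q(u,p_j,\zeta')\left[\Phi_g(u,zp_j,\zeta') - 1\right],\\
\Phi_g(u,p_j,\zeta) &= u + \int_{\zeta_0}^{\zeta}\frac{d\zeta'}{\zeta'}\int_{p_a/p_j}^{1} dz\,\frac{\alpha_s(k_t)}{2\pi}\Big\{P_{gg}(z)\,\Phi_g(u,p_j,\zeta')\left[\Phi_g(u,zp_j,\zeta') - 1\right]\\
&\qquad + P_{qg}(z)\left[\Phi_q(u,p_j,\zeta')^2 - \Phi_g(u,p_j,\zeta')\right]\Big\} .
\end{aligned} \tag{2.5}$$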

Here, the running coupling is evaluated at the transverse momentum scale of the emission, $k_t^2 = z^2 p_j^2 \zeta'$. Defining $\bar\alpha_s = \alpha_s(p_j^2\zeta)/\pi$, i.e. in terms of the coupling at the hard scale, we have to NDLA

$$\frac{\alpha_s(k_t)}{\pi} = \bar\alpha_s - b_0\bar\alpha_s^2\left[2\ln z + \ln\frac{\zeta'}{\zeta}\right], \tag{2.6}$$

with $b_0 = (11C_A - 2n_f)/12$.
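To see how the NDLA expansion (2.6) compares with full one-loop running, the sketch below evaluates both for the same pair of scales. The values $\bar\alpha_s = 0.1/\pi$ and $n_f = 5$ used in the check, and the function names, are illustrative assumptions rather than inputs of this work.

```python
import math


def b0(n_f, C_A=3.0):
    """One-loop coefficient b_0 = (11 C_A - 2 n_f) / 12, as below Eq. (2.6)."""
    return (11.0 * C_A - 2.0 * n_f) / 12.0


def abar_ndla(abar, z, zeta_ratio, n_f=5):
    """Eq. (2.6): alpha_s(k_t)/pi expanded to first order in abar; zeta_ratio = zeta'/zeta."""
    return abar - b0(n_f) * abar ** 2 * (2.0 * math.log(z) + math.log(zeta_ratio))


def abar_one_loop(abar, z, zeta_ratio, n_f=5):
    """Full one-loop running of alpha_s/pi over the same log, ln(k_t^2 / (p_j^2 zeta))."""
    log_ratio = 2.0 * math.log(z) + math.log(zeta_ratio)
    return abar / (1.0 + b0(n_f) * abar * log_ratio)
```

For $z = 0.5$ and $\zeta'/\zeta = 0.5$ the two agree to within about 2%; the difference is of order $b_0^2\bar\alpha_s^3\ln^2$, i.e. beyond the accuracy retained in Eq. (2.6).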

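The solution for the quark generating function is easily seen to be

$$\Phi_q(u,p_j,\zeta) = u\exp\left\{\int_{\zeta_0}^{\zeta}\frac{d\zeta'}{\zeta'}\int_{p_a/p_j}^{1} dz\,\frac{\alpha_s(k_t)}{2\pi}\,P_{gq}(z)\left[\Phi_g(u,zp_j,\zeta') - 1\right]\right\} . \tag{2.7}$$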


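We can solve for the gluon generating function by iteration, and then substitute it in this equation to get the complete solution. For brevity we define the following logarithms:

$$\kappa = \ln(p_j/p_a), \quad \kappa' = \ln(zp_j/p_a), \quad \lambda = \ln(\zeta_a/\zeta_0) = 2\ln(R_a/R), \quad \lambda' = \ln(\zeta'/\zeta_0), \tag{2.8}$$

so that $z = e^{\kappa'-\kappa}$, $dz/z = d\kappa'$ and $d\zeta'/\zeta' = d\lambda'$.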


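In terms of these variables the NDLA quark generating function is

$$\Phi_q(u,\kappa,\lambda) = u\exp\left\{\int_0^{\lambda} d\lambda'\int_0^{\kappa} d\kappa'\,\Gamma_q(\kappa',\lambda',\kappa,\lambda)\left[\Phi_g(u,\kappa',\lambda') - 1\right]\right\}, \tag{2.9}$$

where, including the full $P_{gq}$ splitting function¹,

$$\Gamma_q(\kappa',\lambda',\kappa,\lambda) = C_F\bar\alpha_s\left[1 - e^{\kappa'-\kappa} + \tfrac{1}{2}e^{2(\kappa'-\kappa)}\right] - C_F b_0\bar\alpha_s^2\left[2(\kappa'-\kappa) + \lambda' - \lambda\right] . \tag{2.10}$$

Defining similarly²

$$\begin{aligned}
\Gamma_g(\kappa',\lambda',\kappa,\lambda) &= C_A\bar\alpha_s\left[1 - e^{\kappa'-\kappa} + \tfrac{1}{2}e^{2(\kappa'-\kappa)} - \tfrac{1}{2}e^{3(\kappa'-\kappa)}\right] - C_A b_0\bar\alpha_s^2\left[2(\kappa'-\kappa) + \lambda' - \lambda\right],\\
\Gamma_f(\kappa',\kappa) &= \frac{n_f}{4}\bar\alpha_s\left[e^{\kappa'-\kappa} - 2e^{2(\kappa'-\kappa)} + 2e^{3(\kappa'-\kappa)}\right],
\end{aligned} \tag{2.11}$$

we solve the gluon generating function by iteration to second order in $u$, which gives the probabilities for 0 or 1 associated jets:

$$\Phi_g(u,\kappa,\lambda) = u\,\Delta_g(\kappa,\lambda)\left\{1 + u\int_0^{\lambda} d\lambda'\int_0^{\kappa} d\kappa'\left[\Gamma_g(\kappa',\lambda',\kappa,\lambda)\,\Delta_g(\kappa',\lambda') + \Gamma_f(\kappa',\kappa)\,\Delta_f(\kappa,\lambda')\right] + \mathcal{O}(u^2)\right\}, \tag{2.12}$$

where $\Delta_q(\kappa,\lambda)$ and $\Delta_g(\kappa,\lambda)$ are the quark and gluon Sudakov factors (the probabilities for no associated jets), and we have defined $\Delta_f(\kappa,\lambda) = \Delta_q^2(\kappa,\lambda)/\Delta_g(\kappa,\lambda)$. Hence

$$\begin{aligned}
P_0^q = \Delta_q(\kappa,\lambda) &= \exp\left\{-\int_0^{\lambda} d\lambda'\int_0^{\kappa} d\kappa'\,\Gamma_q(\kappa',\lambda',\kappa,\lambda)\right\}\\
&= \exp\left\{-C_F\bar\alpha_s\lambda\left[\kappa - \tfrac{3}{4} + e^{-\kappa} - \tfrac{1}{4}e^{-2\kappa}\right] - C_F b_0\bar\alpha_s^2\,\kappa\lambda\left(\kappa + \tfrac{1}{2}\lambda\right)\right\},
\end{aligned} \tag{2.13}$$

$$\begin{aligned}
P_0^g = \Delta_g(\kappa,\lambda) &= \exp\left\{-\int_0^{\lambda} d\lambda'\int_0^{\kappa} d\kappa'\left[\Gamma_g(\kappa',\lambda',\kappa,\lambda) + \Gamma_f(\kappa',\kappa)\right]\right\}\\
&= \exp\Big\{-C_A\bar\alpha_s\lambda\left[\kappa - \tfrac{11}{12} + e^{-\kappa} - \tfrac{1}{4}e^{-2\kappa} + \tfrac{1}{6}e^{-3\kappa}\right]\\
&\qquad - \frac{n_f}{4}\bar\alpha_s\lambda\left[\tfrac{2}{3} - e^{-\kappa} + e^{-2\kappa} - \tfrac{2}{3}e^{-3\kappa}\right] - C_A b_0\bar\alpha_s^2\,\kappa\lambda\left(\kappa + \tfrac{1}{2}\lambda\right)\Big\},
\end{aligned} \tag{2.14}$$

$$P_1^q = \Delta_q(\kappa,\lambda)\int_0^{\lambda} d\lambda'\int_0^{\kappa} d\kappa'\,\Gamma_q(\kappa',\lambda',\kappa,\lambda)\,\Delta_g(\kappa',\lambda'), \tag{2.15}$$

$$P_1^g = \Delta_g(\kappa,\lambda)\int_0^{\lambda} d\lambda'\int_0^{\kappa} d\kappa'\left[\Gamma_g(\kappa',\lambda',\kappa,\lambda)\,\Delta_g(\kappa',\lambda') + \Gamma_f(\kappa',\kappa)\,\Delta_f(\kappa,\lambda')\right]. \tag{2.16}$$

¹We keep terms that are formally power-suppressed in order to satisfy the boundary condition $P_0 = 1$ when $p_a = p_j$.

²We drop the $\bar\alpha_s^2$ term in $\Gamma_f$ as it is beyond NDLA and does not affect the boundary condition.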
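The closed forms (2.13) and (2.14) are straightforward to evaluate numerically. The sketch below does so with $C_F = 4/3$ and $C_A = 3$; the default $n_f = 5$, the illustrative coupling $\bar\alpha_s = 0.034$ used in the check, and the function names are assumptions for illustration only. At $\kappa = 0$ (i.e. $p_a = p_j$) both functions return 1, reflecting the boundary condition $P_0 = 1$.

```python
import math

C_F, C_A = 4.0 / 3.0, 3.0


def b0(n_f):
    return (11.0 * C_A - 2.0 * n_f) / 12.0


def ndla_logs(p_j, p_a, R, R_a):
    """Eq. (2.8): kappa = ln(p_j/p_a), lambda = 2 ln(R_a/R)."""
    return math.log(p_j / p_a), 2.0 * math.log(R_a / R)


def p0_quark(kappa, lam, abar, n_f=5):
    """Eq. (2.13): probability of no associated jet around a quark jet."""
    bracket = kappa - 0.75 + math.exp(-kappa) - 0.25 * math.exp(-2.0 * kappa)
    running = C_F * b0(n_f) * abar ** 2 * kappa * lam * (kappa + 0.5 * lam)
    return math.exp(-C_F * abar * lam * bracket - running)


def p0_gluon(kappa, lam, abar, n_f=5):
    """Eq. (2.14): probability of no associated jet around a gluon jet."""
    e1, e2, e3 = (math.exp(-k * kappa) for k in (1, 2, 3))
    soft = kappa - 11.0 / 12.0 + e1 - 0.25 * e2 + e3 / 6.0     # C_A term
    split = 2.0 / 3.0 - e1 + e2 - 2.0 * e3 / 3.0                # g -> q qbar term
    running = C_A * b0(n_f) * abar ** 2 * kappa * lam * (kappa + 0.5 * lam)
    return math.exp(-C_A * abar * lam * soft - 0.25 * n_f * abar * lam * split - running)
```

For instance, with $p_j = 300$ GeV, $p_a = 20$ GeV, $R = 0.4$ and $R_a = 0.8$, `1 - p0_gluon(...)` exceeds `1 - p0_quark(...)`, i.e. a gluon jet is more likely to have at least one associated jet.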


3 Associated jet rates: comparison with Monte Carlo 

We are now in a position to compare the NDLA predictions for associated jet rates discussed in the previous section with the results obtained using the Herwig++ [20] and Pythia8 [21] event generators³, where the quark- and gluon-initiated jets are simulated using the $Z + q$ and $Z + g$ processes at leading order in QCD (with the $Z$ boson subsequently decayed leptonically). The event samples were generated for proton-proton collisions at the 13 TeV LHC, using the CTEQ6L1 [22] parton distribution functions (PDF) for the Pythia generators and the default MRST LO** [23] PDF and UE model for Herwig++. Subsequently, we used a modified version of DELPHES2 [24] to include detector effects. For the observables based on charged tracks discussed in the following, we use a minimum $p_T$ threshold of 1 GeV for each track. All jets are reconstructed with an anti-$k_t$ algorithm [13, 25] with radius parameter $R = 0.4$, and are required to have $p_T > 20$ GeV. In addition, the leading jet is required to be central, with $|\eta| < 2$.

In Fig. 2 we show the probability of obtaining $n$ associated jets as a function of the jet $p_T$ for $n = 0$, $1$ and $n > 1$, for quark- and gluon-initiated jets, in the left and right columns respectively. The association radius is set to $R_a = 0.8$ and the minimum associated jet transverse momentum is $p_a = 20$ GeV. In the MC simulations, $P_n$ has been computed as a function of $p_T(js)$, which is the vector sum of the leading jet and associated jet $p_T$'s. The jet rates are studied as a function of $p_T(js)$, as it is closer to the transverse momentum of the parton that initiates the final state shower.

³To be specific, we use Herwig++ 2.7.0 and Pythia 8.201 (tune 4C) for all our calculations.

Figure 2. Comparison of the Herwig++ and Pythia8 MC predictions for associated jet rates with the NDLA results, as a function of $p_T(js)$, for quark jets (left) and gluon jets (right), with $R_a = 0.8$ and $p_a = 20$ GeV. Here, $p_T(js)$ is the vector sum of the leading jet and associated jet $p_T$'s.

We see that the functional behaviour with respect to the jet $p_T$ in the MC computation⁴ and in the NDLA calculation is similar, although there are some differences in the values of $P_n$. In particular, the MC prediction of $P_1$ for quark and gluon jets is higher than the NDLA result, especially at higher $p_T(js)$, with Herwig++ giving rise to a slightly larger $P_1$ compared to Pythia8. For a quark jet, the probability of having at least one associated jet ranges from around 15% to 25% as we go from $p_T(js) = 200$ GeV to $p_T(js) = 500$ GeV, and at higher $p_T(js)$ the probability essentially remains the same. For gluon jets, the corresponding probability ranges from around 30% to 40% as we go from $p_T(js) = 200$ GeV to $p_T(js) = 500$ GeV. The larger probability of having an associated jet around a gluon can thus be utilized to better discriminate it from quarks, as we shall see in the next section.

The NDLA computation includes only the timelike showering of the final-state partons, and ignores some power-suppressed effects due to momentum conservation and hadronization. On the other hand, the MC results shown above include momentum conservation and hadronization as well as the effects of initial state radiation (ISR) and multiple parton interactions (MPI). In order to quantify the effect of ISR and MPI, we compare the predictions for $P_n$ with and without ISR and MPI in Herwig++, Pythia8 as well as in Pythia6 [26] (we use the version Pythia 6.4.28 with the AUET2B-CT6L tune) in Fig. 3. It is clear from this figure that the impact of ISR and MPI is rather small for our choice of the association radius $R_a = 0.8$, thereby making the predictions stable against such effects. For this choice of $R_a$, Pythia8 shows the largest variation against such effects, followed by Pythia6, while the effects are indeed negligible for Herwig++⁵. Furthermore, the MC results become closer to the NDLA ones when ISR and MPI effects are switched off.

⁴For the associated jet rate calculations, we generated MC event samples of 20,000 events each, fixing the threshold for the minimum leading jet $p_T$ at $50 \times (i+1)$ GeV, for $i \in [0, 19]$. Only events with the leading jet $p_T(js)$ above the generation threshold are used in the analysis. This ensures uniform MC statistics in the whole range of $p_T(js)$.

Figure 3. Comparison of the Herwig++, Pythia8 and Pythia6 predictions for associated jet rates with and without ISR and MPI, as a function of $p_T(js)$, for quark jets (left) and gluon jets (right). Here, $p_T(js)$ is the vector sum of the leading jet and associated jet $p_T$'s.

We also investigated the effects of momentum conservation by changing the recombination scheme in the anti-$k_t$ jet algorithm from the default E-scheme to the “winner-take-all” scheme introduced in [27], which is less sensitive to recoils in the parton shower [28]. Such a change increases the MC associated jet rates very slightly. We believe this is because the axis of the leading jet is moved away from the overall momentum vector of the system. The effects are roughly proportional for quark and gluon jets, so they would not affect the discrimination significantly.
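The difference between the two recombination schemes can be illustrated with a single two-particle merge. In this sketch (hypothetical function names; inputs are massless `(pt, y, phi)` tuples), the E-scheme axis is pulled towards the softer particle, while the winner-take-all variant, in the $p_T$-weighted form we assume here, keeps the direction of the harder input and assigns the scalar $p_T$ sum.

```python
import math


def _to_four(pt, y, phi):
    # Massless (pt, y, phi) -> (px, py, pz, E)
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))


def e_scheme(a, b):
    """E-scheme: add four-momenta; return (pt, y, phi) of the (generally massive) sum."""
    px, py, pz, e = (u + v for u, v in zip(_to_four(*a), _to_four(*b)))
    return (math.hypot(px, py), 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px))


def wta_pt_scheme(a, b):
    """Winner-take-all: scalar-sum pt, direction of the harder input."""
    hard = a if a[0] >= b[0] else b
    return (a[0] + b[0], hard[1], hard[2])
```

Merging `(100, 0, 0)` with `(20, 0.3, 0.2)` shifts the E-scheme axis slightly towards the softer particle, whereas the winner-take-all axis stays exactly at $y = \phi = 0$.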

4 Quark-gluon separation: multivariate analysis 

4.1 Variables for quark-gluon separation 

A large number of variables have been surveyed in the context of quark-gluon discrimination, constructed out of either track-based observables or calorimeter-based ones [1-5]. While the former category has the practical advantage of being more accurate, due to better track momentum resolution, as well as being less prone to pile-up contamination, the latter category can be used for jets with larger rapidities outside the tracker coverage. The most widely studied variables include the number of charged tracks inside the jet cone ($n_{ch}$), the jet width [1] and the energy-energy-correlation (EEC) angularity [4]. The jet width is defined as

$$w = \frac{\sum_i p_{T,i}\times\Delta R(i,\mathrm{jet})}{\sum_i p_{T,i}}, \tag{4.1}$$

where the sum goes over all the tracks associated to the jet. A similar track-based EEC variable, denoted by $C_1^{(\beta)}$, can be defined as

$$C_1^{(\beta)} = \frac{\sum_{i,j} p_{T,i}\times p_{T,j}\times\left(\Delta R(i,j)\right)^{\beta}}{\left(\sum_i p_{T,i}\right)^2}. \tag{4.2}$$

Here again the sums over $i$ and $j$ run over all the tracks associated to the jet, with $j > i$, while $\beta$ is a tunable parameter. It has been demonstrated in Refs. [3, 4] that smaller values of the exponent $\beta$ lead to a better quark-gluon separation, and $\beta = 0.2$ is found to be optimal from perturbative calculations and MC studies based on the Herwig++ and Pythia8 generators. We have compared the performance of the jet width variable $w$ and the EEC variable $C_1^{(0.2)}$ in the multivariate analyses (MVA) to be discussed below, and find that in all cases $C_1^{(0.2)}$ leads to a better separation of gluons from quarks. Therefore, in the following, we only show results based on $n_{ch}$ (with each charged track having $p_T > 1$ GeV) and $C_1^{(0.2)}$. In addition, we include the associated jet information as well as the jet mass variable, and compare the performance of the different MVA methods. As seen in the previous section, for $n = 1$ or $n > 1$ the probability of finding $n$ associated jets, $P_n$, is significantly larger for gluon jets than for quark-initiated ones across the whole $p_T$ range of interest. Therefore, the presence (or absence) of an associated jet within a certain distance $R_a$ of a high-$p_T$ jet can be used to further improve the separation.

⁵However, we have checked that if we take a larger association radius, $R_a > 1.2$, the ISR effects become appreciable in Herwig++.
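Eqs. (4.1) and (4.2) translate directly into code. The sketch below assumes the tracks have already been selected (e.g. $p_T > 1$ GeV) and are given as hypothetical `(pt, y, phi)` tuples, with the jet axis supplied in the same format; the function names are illustrative.

```python
import math
from itertools import combinations


def _dr(a, b):
    # (y, phi) distance with azimuthal wrap-around
    dphi = abs(a[2] - b[2])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a[1] - b[1], dphi)


def jet_width(tracks, jet_axis):
    """Eq. (4.1): pT-weighted mean distance of the tracks from the jet axis."""
    return sum(t[0] * _dr(t, jet_axis) for t in tracks) / sum(t[0] for t in tracks)


def eec_c1(tracks, beta=0.2):
    """Eq. (4.2): C_1^(beta), summed over track pairs with j > i."""
    num = sum(a[0] * b[0] * _dr(a, b) ** beta for a, b in combinations(tracks, 2))
    return num / sum(t[0] for t in tracks) ** 2
```

Using `itertools.combinations` enumerates each unordered pair once, which matches the $j > i$ restriction in Eq. (4.2).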

As the boundary between the signal and background regions in the hyper-surface spanned by the variables is non-linear, it is beneficial to adopt a multivariate analysis strategy rather than a cut-based one. For this purpose, we employed a Boosted Decision Tree (BDT) algorithm with the help of the TMVA toolkit [29] in the ROOT framework. The training of the classifier was performed with $Z$ + quark jet and $Z$ + gluon jet samples, and we generated the above MC samples uniformly distributed in jet $p_T$⁶. The input variables for the two-variable training are taken to be $n_{ch}$ and $C_1^{(0.2)}$, while for the three-variable trainings we further include the variable $m_j/p_{T,j}$, where $m_j$ is the jet mass and $p_{T,j}$ is the transverse momentum of the leading jet. The information on the number of associated jets is included in the form of two categories ($n = 0$ or $n \geq 1$) in the MVA.

It should be emphasized that the MC prediction of the discrimination variables, especially of the number of charged tracks $n_{ch}$, is quite sensitive not only to the parton shower (PS) algorithm adopted and the related parameters, but also to the tuning of the hadronization and underlying event models. This is expected, since $n_{ch}$ is not an infrared safe quantity, and only the ratio $\langle n_{ch}\rangle_g/\langle n_{ch}\rangle_q$ converges, rather slowly, to the ratio of the colour factors $C_A/C_F$ at high jet $p_T$ [30]. The disagreement between different MC’s can therefore be reduced only by appropriate tuning at LHC energies. With this limitation of the MC predictions in view, in this study we compare the performance of different MVA methods within the same MC generator to estimate the improvement from adding observables related to associated jets. We also show the quark-gluon separation as predicted by the different MC’s for comparison. In Appendix A we present details of the distributions of the discrimination variables and the differences between the MC predictions for them.

4.2 Performance in MVA 

Based on the BDT analysis, we obtain the efficiencies of tagging quark ($\epsilon_q$) and gluon jets ($\epsilon_g$) as a function of the cut on the BDT score. In order to judge the power of separating a "quark-rich" signal from a "gluon-rich" background, it is more useful to compare the ratio of the tagging efficiencies as a function of $\epsilon_q$.

⁶The MC event samples for the training of the classifier were generated in the same manner as for the associated jet rate computation in the previous section, but with a smaller step size of 10 GeV for the minimum $p_T(js)$ thresholds.

In Figs. 4-6 (left column) we show the ratio of the gluon and quark tagging efficiencies, $\epsilon_g/\epsilon_q$, as a function of $\epsilon_q$, for $400 < p_T(js) < 500$ GeV, with the event samples generated with all three MC codes. Four different MVA methods are shown, corresponding to different choices for the discrimination variables:

• Method-1: Two variables, $n_{ch}$ and $C_1^{(\beta)}$ with $\beta = 0.2$.

• Method-2: Two variables, $n_{ch}$ and $C_1^{(\beta)}$ with $\beta = 0.2$, with two categories determined in terms of the number of associated jets ($n = 0$ or $n \geq 1$).

• Method-3: Three variables, $n_{ch}$, $C_1^{(\beta)}$ with $\beta = 0.2$, and $m_j/p_{T,j}$.

• Method-4: Three variables, $n_{ch}$, $C_1^{(\beta)}$ with $\beta = 0.2$, and $m_j/p_{T,j}$, with two categories determined in terms of the number of associated jets ($n = 0$ or $n \geq 1$).
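The efficiency ratio shown in the figures can be obtained from classifier outputs by scanning the cut. The following sketch assumes that a higher score means more quark-like and uses a simple working-point rule (the first cut with $\epsilon_q$ at or below the target); the function names are hypothetical and this is not the TMVA interface. The improvement factor plotted in the right columns would then be the ratio of the $\epsilon_g$ values returned for two methods at the same $\epsilon_q$.

```python
def efficiencies(scores_q, scores_g, cut):
    """Fractions of quark and gluon jets with classifier score above `cut`."""
    eps_q = sum(s > cut for s in scores_q) / len(scores_q)
    eps_g = sum(s > cut for s in scores_g) / len(scores_g)
    return eps_q, eps_g


def gluon_eff_at(scores_q, scores_g, target_eps_q):
    """Scan cuts (taken from the quark scores, in increasing order) and return
    (eps_q, eps_g) at the first cut for which eps_q <= target_eps_q."""
    for cut in sorted(scores_q):
        eps_q, eps_g = efficiencies(scores_q, scores_g, cut)
        if eps_q <= target_eps_q:
            return eps_q, eps_g
    return 0.0, 0.0
```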




Figure 4. The ratio of the gluon and quark tagging efficiencies, $\epsilon_g/\epsilon_q$, as a function of $\epsilon_q$, for $400 < p_T(js) < 500$ GeV, as determined by MC simulations with Herwig++ (left column). The different MVA methods, determined in terms of the input variables, are explained in the text. To quantify the improvement in quark-gluon separation as we go to Methods 2, 3 and 4, we also show $\epsilon_g(\text{Method-1})/\epsilon_g(\text{Method-}\{2,3,4\})$ as a function of $\epsilon_q$ (right column).

We can quantify the improvement in quark-gluon separation using $\epsilon_g(\text{Method-1})/\epsilon_g(\text{Method-}\{2,3,4\})$ as a function of $\epsilon_q$, as shown in Figs. 4-6 (right). For example, for an operating point of $\epsilon_q = 0.4$, we can obtain an improvement of around 10%, 15% and 20% using Methods 2, 3 and 4, respectively, when compared to Method-1. The differences between the improvement factors obtained using the three MC’s are found to be small.
Figure 5. Same as Fig. 4, with MC simulations using Pythia8.




Figure 6. Same as Fig. 4, with MC simulations using Pythia6. 


In order to estimate the change in tagger performance as we consider lower-$p_T$ jets, we show in Fig. 7 the same results as in Fig. 4, but now with $150 < p_T(js) < 200$ GeV. The improvement on adding associated jet rates is still appreciable, although it is somewhat reduced compared to the higher-$p_T$ range. The fluctuations in the $\epsilon_g$ ratio for lower values of $\epsilon_q$ in Fig. 7 are due to low MC statistics.


Figure 7. Same as Fig. 4, for a lower range of jet $p_T$, $150 < p_T(js) < 200$ GeV. Results using only Herwig++ are shown.


We can see in Figs. 4-6 that there is an improvement in going from a two-variable analysis to a three-variable one by including the variable $m_j/p_{T,j}$. This can be understood as follows. The jet mass variable is related to $C_1^{(2)}$, as can be seen by writing both of them in terms of the $z$, $\theta$ variables for the hardest emission inside the jet cone: $m_j^2 \simeq z(1-z)\theta^2 p_{T,j}^2$ and $C_1^{(\beta)} \simeq z(1-z)\theta^\beta$. Furthermore, $C_1^{(0.2)}$ and $C_1^{(2)}$ are two independent variables belonging to the $C_1$ class which carry all the information on this hardest emission, and including both of them improves the tagger performance. For this reason, further addition of a third variable in the $C_1$ class does not change the performance appreciably, a fact that we explicitly checked with a separate MVA analysis. There is a further improvement in the quark-gluon separation when the information on the number of associated jets is included at the level of categories in both the two- and three-variable MVA analyses. Since the associated jet rates carry the additional information on radiation outside the jet cone, Methods 2 and 4 lead to further improvements as compared to Methods 1 and 3, respectively.
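The statement that $C_1^{(0.2)}$ and $C_1^{(2)}$ together fix the hardest emission can be checked in the idealized limit of a jet with exactly two constituents, where $C_1^{(\beta)} = z(1-z)\theta^\beta$ holds exactly. The sketch below inverts this relation; the function name and the two-prong assumption are illustrative only.

```python
import math


def hardest_emission(c1_b02, c1_b2):
    """Invert C_1^(beta) = z (1 - z) theta^beta for beta = 0.2 and beta = 2.
    Returns (z, theta), with z <= 1/2 the softer momentum fraction."""
    theta = (c1_b2 / c1_b02) ** (1.0 / 1.8)     # ratio of the two is theta^1.8
    zz = c1_b02 / theta ** 0.2                  # z (1 - z)
    z = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * zz))
    return z, theta
```

In the same limit $m_j^2/p_{T,j}^2 \simeq C_1^{(2)}$, which is why the jet mass can play the role of the second $C_1$ variable.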

Method 4 leads to the best performance out of the four different MVA’s considered. In fact, we find that there is an alternative way to include the associated jet rate information of Method 4: using the modified jet mass variable $m(js)/p_{T,j}$ in place of $m_j/p_{T,j}$ in Method 3. Here, $m(js)$ is the jet mass computed by adding the leading jet and associated jet four-momenta. Because of the larger associated jet rate, for the same $p_T(js)$, $m(js)$ is higher for a gluon jet compared to a quark, while $p_{T,j}$ is lower. Therefore, using either the associated jet rate categories together with $m_j/p_{T,j}$, or using only the variable $m(js)/p_{T,j}$, leads to the same MVA performance, as shown in Fig. 8.
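A minimal sketch of the modified mass variable, assuming jets are given as hypothetical `(pt, y, phi, m)` tuples: the leading-jet and associated-jet four-momenta are summed before the mass is taken, and the result is divided by the leading-jet $p_T$. With no associated jet it reduces to $m_j/p_{T,j}$.

```python
import math


def four_vector(pt, y, phi, m=0.0):
    """(px, py, pz, E) from pt, rapidity, azimuth and mass."""
    mt = math.sqrt(pt * pt + m * m)
    return (pt * math.cos(phi), pt * math.sin(phi), mt * math.sinh(y), mt * math.cosh(y))


def mass(v):
    px, py, pz, e = v
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def modified_mass_ratio(leading, associated):
    """m(js) / p_T,j: invariant mass of leading + associated jets over the leading-jet p_T."""
    total = four_vector(*leading)
    for jet in associated:
        total = tuple(a + b for a, b in zip(total, four_vector(*jet)))
    return mass(total) / leading[0]
```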



Figure 8. Comparison of Method 4, which includes $m_j/p_{T,j}$ and the associated jet rates as categories in the MVA, and the alternative method of including the associated jet rate information by using the modified jet mass variable $m(js)/p_{T,j}$. Both methods lead to the same MVA performance.


5 Subjet rates in jets: analytical calculations 

The number of charged tracks inside a jet cone, $n_{ch}$ (with each track having transverse momentum above a threshold, usually taken to be around 1 GeV), is often used as a good discriminating variable. However, as mentioned earlier, the MC predictions for this observable are quite sensitive not only to the parton shower (PS) algorithm and the related parameters, but also to the tuning of the hadronization and underlying event models. On the other hand, we find that the number of subjets of a primary jet leads to a more uniform prediction across the MC’s, and thus can be better suited for quark-gluon separation studies. The number of subjets as a quark-gluon separation variable was considered earlier in Ref. [1]. In this study, we compute the subjet rates to NDLA, and show a detailed comparison with different MC’s.

We find the subjets of jet $j$ with the exclusive $k_t$ algorithm, which applies the dimensionless distance measure

$$y_{ik} = \frac{\min\{p_{ti}^2, p_{tk}^2\}\,\Delta R_{ik}^2}{R^2 p_j^2} \tag{5.1}$$

to its constituent objects and clusters them as discussed for a generalized $k_t$ algorithm in Sec. 2, until the smallest $y_{ik}$ is above $y_{cut}$. Thus the subjet rates are functions of the jet $p_t = p_j$, the jet radius $R$, and $y_{cut}$.
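The exclusive clustering with the measure (5.1) can be sketched as follows. Constituents are hypothetical four-vector tuples `(px, py, pz, E)`, pairs are merged by four-momentum addition, and the jet $p_t$ in the denominator is taken from the sum of all constituents; these are illustrative choices, not a description of the analysis code.

```python
import math


def massless(pt, y, phi):
    """Four-vector (px, py, pz, E) of a massless constituent."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))


def kinematics(v):
    """(pt, y, phi) of a four-vector (px, py, pz, E)."""
    px, py, pz, e = v
    return math.hypot(px, py), 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)


def y_ik(a, b, R, p_jet):
    """Eq. (5.1): min(p_ti^2, p_tk^2) dR_ik^2 / (R^2 p_j^2)."""
    pta, ya, phia = kinematics(a)
    ptb, yb, phib = kinematics(b)
    dphi = abs(phia - phib)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return min(pta, ptb) ** 2 * ((ya - yb) ** 2 + dphi ** 2) / (R * p_jet) ** 2


def n_subjets(constituents, R, y_cut):
    """Exclusive kt: merge the pair with smallest y_ik until all y_ik exceed y_cut."""
    objs = list(constituents)
    p_jet = kinematics(tuple(map(sum, zip(*objs))))[0]   # jet pt from summed constituents
    while len(objs) > 1:
        y_min, i, k = min((y_ik(objs[i], objs[k], R, p_jet), i, k)
                          for i in range(len(objs)) for k in range(i + 1, len(objs)))
        if y_min > y_cut:
            break
        merged = tuple(u + w for u, w in zip(objs[i], objs[k]))   # four-momentum merge
        objs.pop(k)                                               # k > i: remove k first
        objs.pop(i)
        objs.append(merged)
    return len(objs)
```

Scanning `y_cut = exp(-L)` over a range of $L$ and histogramming `n_subjets` would give MC subjet rates of the kind compared with the resummed results below.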

In this section, we compute the subjet rates to NDLA, i.e. considering terms $\alpha_s^n L^{2n}$ and $\alpha_s^n L^{2n-1}$, where now $L = \ln(1/y_{cut})$. The relevant generating functions in this case are those given in Refs. [10, 19]:


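$$\Phi_q(u,Q) = u\,\Delta_q(Q)\exp\left(\int_{Q_0}^{Q} dq\,\Gamma_q(Q,q)\,\Phi_g(u,q)\right), \tag{5.2}$$

$$\Phi_g(u,Q) = u\,\Delta_g(Q)\exp\left(\int_{Q_0}^{Q} dq\left[\Gamma_g(Q,q)\,\Phi_g(u,q) + \Gamma_f(Q,q)\,\frac{\Phi_q(u,q)^2}{\Phi_g(u,q)}\right]\right), \tag{5.3}$$

where $Q = Rp_j$ is the jet scale, $Q_0 = \sqrt{y_{cut}}\,Q$ is the resolution scale, and⁷

$$\Gamma_q(Q,q) = \frac{2C_F\,\alpha_s(q^2)}{\pi q}\left(\ln\frac{Q}{q} - \frac{3}{4} + \frac{q}{Q} - \frac{q^2}{4Q^2}\right), \tag{5.4}$$

$$\Gamma_g(Q,q) = \frac{2C_A\,\alpha_s(q^2)}{\pi q}\left(\ln\frac{Q}{q} - \frac{11}{12} + \frac{q}{Q} - \frac{q^2}{4Q^2} + \frac{q^3}{6Q^3}\right), \tag{5.5}$$

$$\Gamma_f(Q,q) = \frac{n_f\,\alpha_s(q^2)}{3\pi q}\left(1 - \frac{3}{2}\frac{q}{Q} + \frac{3}{2}\frac{q^2}{Q^2} - \frac{q^3}{Q^3}\right). \tag{5.6}$$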

The Sudakov factors for no resolvable emission are now

$$\Delta_q(Q) = \exp\left(-\int_{Q_0}^{Q} dq\,\Gamma_q(Q,q)\right), \tag{5.7}$$

$$\Delta_g(Q) = \exp\left(-\int_{Q_0}^{Q} dq\left[\Gamma_g(Q,q) + \Gamma_f(Q,q)\right]\right). \tag{5.8}$$
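For orientation, the Sudakov factors (5.7) and (5.8), i.e. the 1-subjet rates $R_1^q$ and $R_1^g$, can be evaluated in closed form if the coupling is held fixed. This is a simplifying assumption that drops the running of $\alpha_s(q^2)$ kept in the text; with $t = \ln(Q/q)$ the integrals run over $0 < t < \ln(Q/Q_0) = L/2$. Function names are hypothetical.

```python
import math

C_F, C_A = 4.0 / 3.0, 3.0


def _i_quark(T):
    # int_0^T dt [t - 3/4 + e^{-t} - e^{-2t}/4], from Eq. (5.4) with t = ln(Q/q)
    return T * T / 2 - 0.75 * T + (1 - math.exp(-T)) - (1 - math.exp(-2 * T)) / 8


def _i_gluon(T):
    # int_0^T dt [t - 11/12 + e^{-t} - e^{-2t}/4 + e^{-3t}/6], from Eq. (5.5)
    return (T * T / 2 - 11.0 * T / 12 + (1 - math.exp(-T))
            - (1 - math.exp(-2 * T)) / 8 + (1 - math.exp(-3 * T)) / 18)


def _i_split(T):
    # int_0^T dt [1 - 3e^{-t}/2 + 3e^{-2t}/2 - e^{-3t}], from Eq. (5.6)
    return (T - 1.5 * (1 - math.exp(-T)) + 0.75 * (1 - math.exp(-2 * T))
            - (1 - math.exp(-3 * T)) / 3)


def r1_quark(L, abar):
    """R_1^q = Delta_q(Q), Eq. (5.7), with fixed abar = alpha_s/pi."""
    return math.exp(-2 * C_F * abar * _i_quark(L / 2))


def r1_gluon(L, abar, n_f=5):
    """R_1^g = Delta_g(Q), Eq. (5.8), under the same fixed-coupling assumption."""
    T = L / 2
    return math.exp(-2 * C_A * abar * _i_gluon(T) - n_f / 3 * abar * _i_split(T))
```

Because of the power-suppressed terms, both integrands vanish at $t = 0$ and both rates start at exactly 1 for $L = 0$; the gluon rate then falls faster with $L$, as expected from $C_A > C_F$.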


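Hence the rates for 1, 2 or 3 subjets in a quark jet are:

$$\begin{aligned}
R_1^q &= \Delta_q(Q),\\
R_2^q &= \Delta_q(Q)\int_{Q_0}^{Q} dq\,\Gamma_q(Q,q)\,\Delta_g(q),\\
R_3^q &= \Delta_q(Q)\int_{Q_0}^{Q} dq\int_{Q_0}^{q} dq'\,\Gamma_q(Q,q)\,\Delta_g(q)\times\left\{\left[\Gamma_q(Q,q') + \Gamma_g(q,q')\right]\Delta_g(q') + \Gamma_f(q,q')\,\Delta_f(q')\right\},
\end{aligned} \tag{5.9}$$

where $\Delta_f = \Delta_q^2/\Delta_g$, and for a gluon jet we have

$$\begin{aligned}
R_1^g &= \Delta_g(Q),\\
R_2^g &= \Delta_g(Q)\int_{Q_0}^{Q} dq\left[\Gamma_g(Q,q)\,\Delta_g(q) + \Gamma_f(Q,q)\,\Delta_f(q)\right],\\
R_3^g &= \Delta_g(Q)\int_{Q_0}^{Q} dq\int_{Q_0}^{q} dq'\,\Big(\Gamma_g(Q,q)\,\Delta_g(q)\times\left\{\left[\Gamma_g(Q,q') + \Gamma_g(q,q')\right]\Delta_g(q') + \left[\Gamma_f(Q,q') + \Gamma_f(q,q')\right]\Delta_f(q')\right\}\\
&\qquad + \Gamma_f(Q,q)\,\Delta_f(q)\times\left\{\left[\Gamma_g(Q,q') - \Gamma_g(q,q')\right]\Delta_g(q') + 2\,\Gamma_q(q,q')\,\Delta_g(q')\right\}\Big).
\end{aligned} \tag{5.10}$$

⁷Here again we keep power-suppressed corrections in order to satisfy boundary conditions.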


6 Subjet rates in jets: comparison with Monte Carlo 

We now compare the above results with Monte Carlo predictions. MC samples of quark and gluon jets were prepared for the subjet analysis using the same setup as in the associated jet study in Sec. 3; however, detector effects and minimum $p_T$ cuts for the charged and neutral hadrons were not included in this analysis. In this sense, our study of the subjet rates should be taken as illustrative, and we do not include the subjet rates in an MVA analysis in this paper. As we shall see in the following, one needs to go down to at least $L = 4$ to have some discrimination power. This corresponds to going down to about 0.1 in $\Delta R$ resolution, which is the typical size of calorimeter cells, although the $\Delta R$ separation of subjets would be larger when the subjet $p_T$ is smaller compared to the primary jet $p_T$. Therefore, in a proper analysis, combining track and calorimeter information is essential, and a detailed experimental study is necessary, which is beyond the scope of this paper.

Figure 9 shows comparisons between the resummed results of Eqs. (5.9) and (5.10) and the MC results for jets with $p_{T,j} \in [500, 600]$ GeV and $R = 0.4$. For quark jets the different MCs agree quite well with each other and with the resummed calculations, the MC predictions being somewhat below the resummed 1-subjet rate for $L > 4$, and vice versa for 2 subjets. Hadronization effects are small for $L < 7$; beyond this the 1- and 2-subjet rates are suppressed and the higher subjet rates are correspondingly enhanced. At this value of $R\,p_{T,j}$, $L = 7$ corresponds to resolving subjets with $\min\{p_{ti}, p_{tj}\}\,\Delta R_{ij} \sim 6$ GeV.
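The conversions between $L$ and a physical resolution quoted in this section (about 6 GeV at $L = 7$, about 15 GeV at $L = 5$, and a $\Delta R$ reach of about 0.1 at $L = 4$) can be reproduced with a few lines of arithmetic. The sketch below assumes that two subjets are resolved when $\min\{p_{ti},p_{tj}\}\,\Delta R_{ij} > \sqrt{y_{\rm cut}}\,R\,p_{T,j}$, so that the resolution scale is $R\,p_{T,j}\,e^{-L/2}$; this form is inferred from the quoted numbers rather than stated here, and the function names are hypothetical.

```python
import math


def kt_resolution_gev(L, pt_jet, R):
    """Smallest resolved min(pt_i, pt_j) * dR_ij (GeV) at L = -ln(y_cut).

    Assumed criterion: min(pt_i, pt_j) * dR_ij > sqrt(y_cut) * R * pt_jet.
    """
    return R * pt_jet * math.exp(-L / 2.0)


def min_delta_r_equal_sharing(L, R):
    """Smallest resolved dR for a symmetric splitting, min(pt_i, pt_j) = pt_jet / 2."""
    return 2.0 * R * math.exp(-L / 2.0)


if __name__ == "__main__":
    for L in (4, 5, 7):
        lo = kt_resolution_gev(L, 500.0, 0.4)
        hi = kt_resolution_gev(L, 600.0, 0.4)
        print(f"L = {L}: resolution {lo:.1f}-{hi:.1f} GeV, "
              f"symmetric dR ~ {min_delta_r_equal_sharing(L, 0.4):.2f}")
```

For $p_{T,j} \in [500, 600]$ GeV and $R = 0.4$ this gives roughly 6 to 7 GeV at $L = 7$, about 16 to 20 GeV at $L = 5$, and $\Delta R \approx 0.11$ for a symmetric splitting at $L = 4$, in line with the values quoted in the text.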

For gluon jets the agreement between the resummed results and the Monte Carlos is still quite close for 1 subjet. For 2 and 3 subjets the MC peaks lie at roughly the same $L$ as the resummed ones but are higher, with the effect that the MC rate for 4 or more subjets is substantially suppressed relative to the resummed prediction. Once again the hadronization effects are small for $L < 7$, beyond which the 1- and 2-subjet rates are suppressed and the higher subjet rates are enhanced, actually bringing the latter into close agreement with the analytical calculations.


Figure 9. Subjet rates $R_n$ with $n = 1, 2, 3$ and $n > 3$ as a function of $L = -\ln(y_{\rm cut})$, for quark jets (black) and gluon jets (red), with $p_{T,j} \in [500, 600]$ GeV, $R = 0.4$. Curves are Herwig++ (dashed), Pythia6 (dot-dashed), Pythia8 (dotted) and NDLA resummed (solid).


In conclusion, the fairly good agreement between the Monte Carlos and the resummed 1-, 2- and 3-subjet rates for $R = 0.4$ and $L$ not too large ($L < 5$, i.e. a subjet resolution above about 15 GeV) suggests that in this range those subjet rates can be used for quark-gluon discrimination. At larger jet radii the agreement remains similar, as we have checked using $R = 0.8$.

7 Summary 

To summarize our findings, we have shown that in studies of light-quark and gluon jet separation at the LHC, it is important to include information on the associated jet rates around a primary hard jet. The associated jet rate is defined as the probability of finding at least one softer reconstructed jet around the primary hard jet under consideration. This probability is found to be substantially higher for a gluon-initiated jet than for a quark-initiated one. Since a small jet radius parameter is commonly adopted in LHC studies of hadronic jets, the associated jet rates carry information on the radiation outside the chosen jet radius.

We compute the associated jet rates up to NDLA accuracy in perturbative QCD, as a function of the primary jet and minimum associated jet $p_T$'s, as well as the jet radius and association radius parameters. The NDLA results are then compared with predictions from different parton shower MCs. Since the NDLA predictions include only the time-like showering of the final-state partons, we also show the effects of ISR and MPI on the MC predictions, and observe that the NDLA predictions are closer to the MCs when ISR and MPI are switched off. Overall, the associated jet rates are not very sensitive to these effects as long as the association radius is not too large.

The probability of having at least one associated jet for a primary gluon jet is roughly a factor of two larger than for a quark jet, with a small variation in this number as a function of the jet $p_T$. This makes the presence or absence of associated jets a good variable for quark-gluon discrimination studies. We demonstrate the impact of the associated jet rate information by including this variable in an MVA, along with the well-studied variables of number of charged tracks, energy-energy-correlation angularities and jet mass. Comparing different two- and three-variable MVAs with and without the associated jet information, we find that including the associated jets leads to an improvement of around 10% in rejecting gluons, for a fixed quark selection efficiency of 0.4. We also show that a three-variable MVA with associated jet categories leads to the best performance, with an improvement of 20% in rejecting gluons for the same quark efficiency.

Since the MC predictions for the number of charged tracks tend to differ, and depend on the parton shower and underlying event parameter tunes, we explore the number of $k_t$ subjets of an anti-$k_t$ jet as a quark-gluon separation variable. We compute the number of subjets to NDLA accuracy and compare the resummed predictions with different MCs. The MC predictions are found to be rather uniform, and the resummed predictions are broadly in agreement with them. However, for gluon jets the peak rates for 2 and 3 subjets are found to be lower in the resummed computation, which might be due to higher-order effects, which are in general larger for gluons.
A Distributions of discrimination variables
Figure 10. Joint distributions of $n_{ch}$ and $C_1^{(\beta)}$ in Herwig++ and Pythia8, for quark and gluon jets with $p_T(j_s) \in [400, 500]$ GeV having $n_{Ajet} = 0$ and $\geq 1$ associated jets.

Figure 11. Joint distributions of $n_{ch}$ and $m_j/p_{T,j}$ in Herwig++ and Pythia8, for quark and gluon jets with $p_T(j_s) \in [400, 500]$ GeV having $n_{Ajet} = 0$ and $\geq 1$ associated jets.

In Figs. 10-12 we show two-dimensional plots of the joint distributions of the three discrimination variables used in the MVA presented in Section 4, for the two Monte Carlo event generators Herwig++ and Pythia8. The following features may be observed:

• There are differences between the distributions predicted by the two Monte Carlos, those of Pythia8 being somewhat narrower for quark jets and substantially narrower for gluon jets.

• The distributions of the infrared-unsafe variable $n_{ch}$ show the greatest differences, with those of Pythia8 being larger at high $n_{ch}$. This could be due to differences in the tuning of the non-perturbative parameters of the generators.

Figure 12. Joint distributions of $C_1^{(\beta)}$ and $m_j/p_{T,j}$ in Herwig++ and Pythia8, for quark and gluon jets with $p_T(j_s) \in [400, 500]$ GeV having $n_{Ajet} = 0$ and $\geq 1$ associated jets.

• The above features are reflected in the likelihood plots, showing the probability ratio $P_q/(P_q+P_g)$, and account for the higher discrimination efficiency predicted by Pythia8 (Fig. 5 vs Fig. 4).

• The quark-gluon discrimination in the events with associated jets is weaker than that for $n_{Ajet} = 0$. This is expected because the events are selected according to $p_T(j_s)$, the sum of the leading and associated jet $p_T$'s. Therefore those with associated jets have leading jets with lower $p_T$, which have lower discriminating power.

• Nevertheless the inclusion of the associated jet category improves the MVA 
performance, because the probability of an associated jet is lower for quark 
jets. 
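The likelihood ratio $P_q/(P_q+P_g)$ shown in these plots, and the gluon efficiency at a fixed quark efficiency used to quote the MVA gains, can be illustrated with binned histograms. The sketch below is a toy, not the MVA used in this paper: all histogram contents are hypothetical numbers, and splitting each histogram into $n_{Ajet} = 0$ and $\geq 1$ halves stands in for the associated jet category. Accepting bins in order of decreasing ratio, and interpolating linearly across the last accepted bin, gives one point of the ROC curve; the working point 0.4 mirrors the quark efficiency quoted in the summary.

```python
def likelihood_ratios(quark_counts, gluon_counts):
    """Per-bin P_q / (P_q + P_g) from quark and gluon histograms with identical binning."""
    nq, ng = sum(quark_counts), sum(gluon_counts)
    out = []
    for cq, cg in zip(quark_counts, gluon_counts):
        pq, pg = cq / nq, cg / ng
        out.append(pq / (pq + pg) if pq + pg > 0 else 0.5)
    return out


def gluon_eff_at_quark_eff(quark_counts, gluon_counts, target_eps_q):
    """Gluon efficiency at a fixed quark efficiency, cutting on the likelihood ratio.

    Bins are accepted in order of decreasing P_q / (P_q + P_g); the last bin is
    accepted fractionally so that the quark efficiency equals the target
    (linear interpolation along the binned ROC curve).
    """
    nq, ng = sum(quark_counts), sum(gluon_counts)
    ratios = likelihood_ratios(quark_counts, gluon_counts)
    order = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    eps_q = eps_g = 0.0
    for i in order:
        dq, dg = quark_counts[i] / nq, gluon_counts[i] / ng
        if dq > 0 and eps_q + dq >= target_eps_q:
            return eps_g + (target_eps_q - eps_q) / dq * dg
        eps_q += dq
        eps_g += dg
    return eps_g


if __name__ == "__main__":
    # Hypothetical toy histograms of one discriminating variable (4 bins).
    q_1d, g_1d = [40, 30, 20, 10], [10, 20, 30, 40]
    # The same toy events split by category (n_Ajet = 0 bins, then n_Ajet >= 1 bins);
    # summing the two halves bin by bin gives back q_1d and g_1d.
    q_2d = [30, 25, 16, 9] + [10, 5, 4, 1]
    g_2d = [4, 14, 18, 24] + [6, 6, 12, 16]
    print(gluon_eff_at_quark_eff(q_1d, g_1d, 0.4))
    print(gluon_eff_at_quark_eff(q_2d, g_2d, 0.4))
```

With these toy numbers the category split lowers the gluon efficiency at a quark efficiency of 0.4 from 0.100 to 0.096, because the quark-to-gluon ratio differs between the two categories even though the one-dimensional distributions are identical by construction.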